The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs

نویسندگان

  • Hoang-Vu Dang
  • Bertil Schmidt
چکیده

Existing formats for SparseMatrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse matrices. We compared SCOO performance to existing formats of the NVIDIA Cusp library. Our resutls on a Fermi GPU show that SCOO outperforms the COO and CSR format for all tested matrices and the HYB format for all tested unstructured matrices. Furthermore, comparison to a Sandy-Bridge CPU shows that SCOO on a Fermi GPU outperforms the multi-threaded CSR implementation of the Intel MKL Library on an i7-2700K by a factor between 5.5 and 18.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs

Numerical methods in sparse linear algebra typically rely on a fast and efficient matrix vector product, as this usually is the backbone of iterative algorithms for solving eigenvalue problems or linear systems. Against the background of a large diversity in the characteristics of high performance computer architectures, it is a challenge to derive a cross-platform efficient storage format alon...

متن کامل

CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs using atomic operations

Article history: Received 10 November 2012 Received in revised form 2 September 2013 Accepted 15 September 2013 Available online 8 October 2013

متن کامل

Optimizing Sparse Matrix-vector Multiplication Based on Gpu

In recent years, Graphics Processing Units(GPUs) have attracted the attention of many application developers as powerful massively parallel system. Computer Unified Device Architecture (CUDA) as a general purpose parallel computing architecture makes GPUs an appealing choice to solve many complex computational problems in a more efficient way. Sparse Matrix-vector Multiplication(SpMV) algorithm...

متن کامل

New Sparse Matrix Storage Format to Improve The Performance of Total SPMV Time

Graphics Processing Units (GPUs) are massive data parallel processors. High performance comes only at the cost of identifying data parallelism in the applications while using data parallel processors like GPU. This is an easy effort for applications that have regular memory access and high computation intensity. GPUs are equally attractive for sparse matrix vector multiplications (SPMV for shor...

متن کامل

Yet another Hybrid Strategy for Auto-tuning SpMV on GPUs

Sparse matrix-vector multiplication (SpMV) is a key linear algebra algorithm and is widely used in many application domains. Besides multi-core architecture, there is also extensive research focusing on accelerating SpMV on many-core Graphics Processing Units (GPUs). SpMV computations have many indirect and irregular memory accesses, and load imbalance could occur while mapping computations ont...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012